Add config file for fused MoE for Nemotron (TP4, B200) by danisereb · Pull Request #34411 · vllm-project/vllm

danisereb · 2026-02-12T09:18:48Z

Purpose

Tune MoE config for Nemotron-H model on B200 with TP4 (using benchmark_moe script).
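For context, a minimal sketch of how a tuned fused MoE JSON file is typically consumed, as I understand the Triton fused MoE path in vLLM: the file maps a token batch size to kernel launch parameters, and the entry whose key is closest to the actual batch size is used. The parameter values below are hypothetical placeholders, not the values added in this PR, and `load_config` / `pick_config` are illustrative helper names rather than vLLM functions.

```python
import json

# Hypothetical file content for illustration only; these are NOT the tuned
# values from this PR. Keys are token batch sizes (strings in JSON), values
# are Triton kernel launch parameters.
EXAMPLE_JSON = """
{
  "1":   {"BLOCK_SIZE_M": 16, "BLOCK_SIZE_N": 64,  "BLOCK_SIZE_K": 128,
          "GROUP_SIZE_M": 1,  "num_warps": 4, "num_stages": 3},
  "64":  {"BLOCK_SIZE_M": 32, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 8,  "num_warps": 4, "num_stages": 4},
  "512": {"BLOCK_SIZE_M": 64, "BLOCK_SIZE_N": 128, "BLOCK_SIZE_K": 64,
          "GROUP_SIZE_M": 16, "num_warps": 8, "num_stages": 4}
}
"""


def load_config(text):
    """Parse the JSON text and convert the batch-size keys to integers."""
    return {int(k): v for k, v in json.loads(text).items()}


def pick_config(configs, num_tokens):
    """Return the entry whose batch-size key is closest to num_tokens."""
    nearest = min(configs, key=lambda m: abs(m - num_tokens))
    return configs[nearest]


configs = load_config(EXAMPLE_JSON)
print(pick_config(configs, 200))  # closest key to 200 is 64
```

Because lookup falls back to the nearest tuned batch size, a config file only needs entries for a representative set of batch sizes rather than every possible value.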

Test Plan

Compare performance with/without the JSON file using vllm bench serve.

Test Result

Input sequence length (ISL) 1024, output sequence length (OSL) 1024

Batch size	Output tok/s (without JSON)	Output tok/s (with JSON)	Perf gain
64	1449.72	1647.53	13.65%
256	2954.71	3361.69	13.77%
512	3635.13	4295.97	18.18%
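The perf gain column is the relative increase in output throughput, (tuned / baseline - 1) × 100. A short sketch that recomputes it from the rounded throughput figures in the table; the recomputed values agree with the reported ones to within about 0.01 percentage points, which is consistent with rounding of the inputs. The name `perf_gain_pct` is illustrative.

```python
# (batch size, output tok/s without JSON, output tok/s with JSON, reported gain %)
ROWS = [
    (64, 1449.72, 1647.53, 13.65),
    (256, 2954.71, 3361.69, 13.77),
    (512, 3635.13, 4295.97, 18.18),
]


def perf_gain_pct(baseline, tuned):
    """Relative throughput increase of tuned over baseline, in percent."""
    return (tuned / baseline - 1.0) * 100.0


for batch, base, tuned, reported in ROWS:
    gain = perf_gain_pct(base, tuned)
    print(f"batch={batch}: recomputed {gain:.2f}% vs reported {reported:.2f}%")
```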

gemini-code-assist

Code Review

This pull request introduces a new configuration file with tuned parameters for the fused Mixture of Experts (MoE) kernel. The configuration is specifically for the Nemotron-H model on NVIDIA B200 GPUs with a tensor parallelism of 4. The change is straightforward, adding a data file for performance optimization. The file format and naming convention are consistent with the existing structure. The changes look good and I don't see any issues.

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

mgoin · 2026-02-12T12:51:31Z

Thanks!

gemini-code-assist Bot reviewed Feb 12, 2026

Add fused MoE tuned JSON for TP4 on B200

0640ece

Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

danisereb force-pushed the tune_moe branch from 11f0530 to 0640ece February 12, 2026 09:54

danisereb marked this pull request as ready for review February 12, 2026 09:54

danisereb requested review from mgoin and pavanimajety as code owners February 12, 2026 09:54

mgoin approved these changes Feb 12, 2026

mgoin enabled auto-merge (squash) February 12, 2026 12:51

github-actions Bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Feb 12, 2026

vllm-bot merged commit dea6351 into vllm-project:main Feb 12, 2026
54 of 60 checks passed

vllm-bot merged 1 commit into vllm-project:main from de-inf:tune_moe
